Abstract: This paper constructs WOM codes that combine
rewriting and error correction for mitigating the reliability and 
the endurance problems in flash memory. We consider a rewriting 
model that is of practical interest to flash applications where only 
the second write uses WOM codes. Our WOM code construction 
is based on binary erasure quantization with LDGM codes, where 
the rewriting uses message passing and has potential to share the 
efficient hardware implementations with LDPC codes in practice. 
We show that the coding scheme achieves the capacity of the 
rewriting model. Extensive simulations show that the rewriting 
performance of our scheme compares favorably with that of 
polar WOM code in the rate region where high rewriting success 
probability is desired. We further augment our coding schemes 
with error correction capability. By drawing a connection to the 
conjugate code pairs studied in the context of quantum error 
correction, we develop a general framework for constructing 
error-correction WOM codes. Under this framework, we give 
an explicit construction of WOM codes whose codewords are 
contained in BCH codes. 

I. Introduction 

Flash memory has become a leading storage medium thanks
to its many excellent features such as random access and high 
storage density. However, it also faces significant reliability 
and endurance challenges. In flash memory, programming 
cells with lower charge levels to higher levels can be done 
efficiently, while the opposite requires erasing the whole 
block containing millions of cells. Block erasure degrades 
cell quality, and current flash memory can survive only a 
small number of block erasures. To mitigate the reliability and 
the endurance issues, this paper studies write-once memory 
(WOM) codes that combine erasure-free information rewriting 
and error correction. 

WOM was first studied by Rivest and Shamir [?]. In the
model of WOM, new information is written by only increasing
cell levels. Compared to traditional flash, WOM-coded flash
achieves higher reliability when the same amount of information
is written, or writes more information using the same
number of program/erase (P/E) cycles. We illustrate these
benefits using Fig. 1, where we show the bit error rates (BERs)
of the first write and the next rewrite measured for the scheme 
of this paper in a 16nm flash chip. When using the standard 
setting for error correcting codes (ECCs), flash memory can 
survive 14000 P/E cycles without an ECC decoding failure. 
Using a code constructed in this paper that allows the user to
write 35% more information, we only need 10370 P/E cycles 
to write the information. Notice that the raw BER at 10370 
P/E cycles is much lower than that at 14000 P/E cycles, hence 
ECC decoding will have much lower failure rate, which leads 
to higher reliability. On the other hand, if we use WOM until 
ECC fails at 14000 P/E cycles, the total amount of information
that is written requires 18900 P/E cycles to write in traditional
flash. WOM codes can also be used for scrubbing the memory.
In this use, the memory is read periodically to correct errors
that were introduced over time. The errors are corrected using
an ECC, and the corrected data is written back using a WOM
code (see [?]).

Fig. 1. The raw BERs when using the proposed rewriting scheme.

Many WOM constructions were proposed
recently. Codes with higher rates were discovered [?],
and codes that achieve capacity have also been found [?]. In
this paper, we propose an alternative construction of WOM 
codes. Our scheme differs from the WOM codes mentioned 
above mainly in two aspects. First, we focus on a specific 
rewriting model with two writes, where only the second write 
uses WOM codes. Such a rewriting scheme has no code rate
loss in the first write, and a recent experimental study has
demonstrated its effectiveness in improving the performance
of solid state drives [22]. Note that the model of this rewriting
scheme is not only an instance of the general WOM model [8],
but also an instance of the model studied by Gelfand and
Pinsker [?]. Second, our construction is based on binary erasure
quantization with low-density-generator-matrix (LDGM)
codes. The encoding is performed by iterative quantization,
studied by Martinian and Yedidia [?], which is a message-passing
algorithm similar to the decoding of low-density-parity-check
(LDPC) codes. As LDPC codes have been widely
adopted by commercial flash memory controllers, the hardware
architectures of message-passing algorithms are well
understood and highly optimized in practice. Therefore, our
codes are implementation-friendly for practitioners. Extensive
simulations show that the rewriting performance of our scheme
compares favorably with that of the capacity-achieving polar
WOM code [?] in the rate region where a low rewriting failure
rate is desired. For instance, we show that our code allows
the user to write 40% more information by rewriting with very
high success probability. We note that the iterative quantization
algorithm of [?] was used in [?] in a different way for
the problem of information embedding, which shares some
similarity with our model.
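
The P/E-cycle figures quoted earlier in this section are consistent with simple scaling. As a rough check, assuming (our reading, not stated explicitly in the text) that each P/E cycle of WOM-coded flash stores 1.35 times the information of a conventional cycle:

$$\frac{14000}{1.35} \approx 10370, \qquad 14000 \times 1.35 = 18900.$$

The first quantity is the number of WOM-coded cycles needed to write what traditional flash writes in 14000 cycles. The second is the number of traditional cycles needed to write what WOM-coded flash writes in 14000 cycles.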

Moreover, our code construction is extended with error
correction. The need for error correction is observed in our
experiments. As shown in Fig. 1, the BERs of both writes
increase rapidly with the number of block erasures.
Constructions of error-correcting WOM codes have been studied
in recent literature [?]. Different from the existing
constructions above, we use conjugate code pairs studied in
the context of quantum error correction [?]. As an example,
we construct LDGM WOM codes whose codewords also
belong to BCH codes. Therefore, our codes allow the use of any
decoding algorithm of BCH codes. The latter have been implemented
in most commercial flash memory controllers. We
also present two additional approaches to add error correction,
and compare their performance.

II. Rewriting and Erasure Quantization

A. Rewriting Model

We consider a model that allows two writes on a block of $n$ cells. A cell has a binary state chosen from $\{0,1\}$, with the rewriting constraint that state 1 can be written to state 0, but not vice versa. All cells are initially set to be in state 1, and so there is no writing constraint for the first write. A vector is denoted by a bold symbol, such as $\mathbf{s} = (s_1, s_2, \ldots, s_n)$. The state of the $n$ cells after the first write is denoted by the vector $\mathbf{s}$. We focus only on the second write, and we assume that after the first write, the state of the cells is i.i.d., where for each $i$, $\Pr\{s_i = 1\} = \beta$. We note that the special case of $\beta = 1/2$ is of practical importance, since it approximates the state after a normal page programming in flash memory.* The second write is concerned with how to store a message $\mathbf{m} \in \mathbb{F}_2^k$ by changing $\mathbf{s}$ to a new state $\mathbf{x}$ such that 1) the rewriting constraint is satisfied, and 2) $\mathbf{x}$ represents $\mathbf{m}$. This is achieved by the encoding operation of a rewriting code, defined formally in the following.

Definition 1. A rewriting code $C_R$ is a collection of disjoint subsets of $\mathbb{F}_2^n$.

Each element of $C_R$ corresponds to a different message. Consider $M \in C_R$ that corresponds to a message $\mathbf{m}$; then for all $\mathbf{x} \in M$, we say that $\mathbf{x}$ is labeled by $\mathbf{m}$. The decoding function maps the set of labeled vectors into their labels, which are also the messages. To encode a message $\mathbf{m}$ given a state $\mathbf{s}$, the encoder needs to find a vector $\mathbf{x}$ with label $\mathbf{m}$ that can be written over $\mathbf{s}$. If the encoder does not find such a vector $\mathbf{x}$, it declares a failure. The rewriting rate of $C_R$ is defined by $R_{\mathrm{WOM}} = k/n$. The rewriting capacity, which characterizes the maximum amount of information that can be stored per cell in the second write, is known to be $\beta$ bits [?].

We are interested in rewriting codes with rates close to the capacity, together with efficient encoding algorithms with low failure probability. The main observation in the design of the proposed rewriting scheme of this paper is that the rewriting problem is related to the problem of binary erasure quantization (BEQ), introduced in the next subsection.

*In flash memory, the message to be written can be assumed to be random due to data compression and data randomization used in memory controllers.

B. Binary Erasure Quantization

The BEQ problem is concerned with the quantization of a binary source sequence $\mathbf{s}'$, for which some bits are erased. Formally, $\mathbf{s}' \in \{0,1,*\}^n$, where $*$ represents erasures. $\mathbf{s}'$ needs to be quantized (compressed) such that every non-erased symbol of $\mathbf{s}'$ will maintain its value in the reconstructed vector. A reconstructed vector with such property is said to have no distortion from $\mathbf{s}'$. In this paper we use linear BEQ codes, defined as follows:

Definition 2. A linear BEQ code $C_Q$ is a subspace of $\mathbb{F}_2^n$. Each $\mathbf{c} \in C_Q$ is called a codeword of $C_Q$. The dimension of $C_Q$ is denoted by $r$.

Each codeword of $C_Q$ is labeled by a different $r$-bit sequence $\mathbf{u}$. Given a BEQ code $C_Q$ and a source sequence $\mathbf{s}'$, a quantization algorithm $Q$ is invoked to find a label $\mathbf{u}$ whose codeword $\mathbf{c} \in C_Q$ has no distortion from $\mathbf{s}'$. If such a label is found, it is denoted by $\mathbf{u} = Q(\mathbf{s}')$, and is considered as the compressed vector. Otherwise, a quantization failure is declared, and $Q(\mathbf{s}') = \text{FAILURE}$. The reconstruction uses a generator matrix $G_Q$ of $C_Q$ to obtain the codeword $\mathbf{c} = \mathbf{u} G_Q$.

C. Reduction from Rewriting to Erasure Quantization

In this subsection we show that the problem of rewriting can be efficiently reduced to that of BEQ. Let $C_Q$ be a linear quantization code, and let $H$ be a parity-check matrix of $C_Q$.

Construction 3. A rewriting code $C_R$ is constructed as the collection of all cosets of $C_Q$ in $\mathbb{F}_2^n$. A decoding function for $C_R$ is defined by a parity check matrix $H$ of $C_Q$, such that a vector $\mathbf{x} \in \mathbb{F}_2^n$ is decoded into its syndrome

$$\mathrm{DEC}_H(\mathbf{x}) = \mathbf{x} H^T. \qquad (1)$$

Since the dimension of $C_Q$ is $r$, it has $2^{n-r}$ cosets. Therefore the rate of $C_R$ is $R_{\mathrm{WOM}} = (n-r)/n$, implying that $k = n - r$. We define some notation before introducing the reduction algorithm. Let $(H^{-1})^T$ be a left inverse for $H^T$, meaning that $(H^{-1})^T H^T$ is the $k \times k$ identity matrix. Define a function $\mathrm{BEC} : \{0,1\}^n \times \{0,1\}^n \to \{0,1,*\}^n$ as:

$$\mathrm{BEC}(\mathbf{w}, \mathbf{v})_i = \begin{cases} w_i & \text{if } v_i = 0 \\ * & \text{if } v_i = 1 \end{cases}, \qquad \forall i = 1, \ldots, n.$$

$\mathrm{BEC}(\mathbf{w}, \mathbf{v})$ realizes a binary erasure channel that erases entries in $\mathbf{w}$ whose corresponding entries in $\mathbf{v}$ equal 1. We are now ready to introduce the encoding algorithm for the rewriting problem.

Algorithm 1 $\mathbf{x} = \mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$: Encoding for Rewriting
1: $\mathbf{z} \leftarrow \mathbf{m} (H^{-1})^T$
2: $\mathbf{s}' \leftarrow \mathrm{BEC}(\mathbf{z}, \mathbf{s})$
3: $\mathbf{u} \leftarrow Q(\mathbf{s}')$
4: if $\mathbf{u} = \text{FAILURE}$ then
5:   return FAILURE
6: else
7:   return $\mathbf{x} \leftarrow \mathbf{u} G_Q + \mathbf{z}$
8: end if

Theorem 4. Algorithm 1 either declares a failure or returns a vector $\mathbf{x}$ such that $\mathbf{x}$ is rewritable over $\mathbf{s}$ and $\mathbf{x} H^T = \mathbf{m}$.

Proof: Suppose failure is not declared and $\mathbf{x}$ is returned by Algorithm 1. We first prove that $\mathbf{x}$ is rewritable over $\mathbf{s}$. Consider $i$ such that $s_i = 0$. Then it follows from the definition of BEC that $s'_i = z_i$. Remember that $Q(\mathbf{s}')$ returns a label $\mathbf{u}$ such that $\mathbf{c} = \mathbf{u} G_Q$ has no distortion from $\mathbf{s}'$. Therefore, $c_i = s'_i = z_i$, and $x_i = c_i + z_i = z_i + z_i = 0 = s_i$. So $\mathbf{x}$ can be written over $\mathbf{s}$. To prove the second statement of the theorem, notice that

$$\mathbf{x} H^T = (\mathbf{u} G_Q + \mathbf{z}) H^T = \mathbf{u} G_Q H^T + \mathbf{m} (H^{-1})^T H^T = \mathbf{m} (H^{-1})^T H^T = \mathbf{m},$$

where the term $\mathbf{u} G_Q H^T$ vanishes because $H$ is a parity-check matrix of $C_Q$.

■
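
The reduction of this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper. The toy code ($n = 6$, $r = 3$, $k = 3$), the matrix `P`, and all function names are assumptions made for the example. The quantizer is a brute-force stand-in for $Q$ that is only usable at toy sizes. The systematic choice $H = (P \; I)$ with $(H^{-1})^T = (0 \; I)$ is one convenient way to satisfy the left-inverse condition.

```python
from itertools import product

ERASED = None  # plays the role of the erasure symbol '*'

def vec_mat(v, M):
    """Row vector times matrix over F_2."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]

def transpose(M):
    return [list(col) for col in zip(*M)]

def bec(w, v):
    """BEC(w, v): erase w_i wherever v_i = 1."""
    return [ERASED if vi == 1 else wi for wi, vi in zip(w, v)]

def brute_force_quantize(G, s_prime):
    """Stand-in for Q: try every label u (exponential in r, toy sizes only)."""
    for u in product([0, 1], repeat=len(G)):
        c = vec_mat(list(u), G)
        if all(sp is ERASED or sp == ci for sp, ci in zip(s_prime, c)):
            return list(u)
    return None  # quantization failure

def enc(G, H_inv_T, m, s):
    """Algorithm 1: returns x, or None when the quantizer fails."""
    z = vec_mat(m, H_inv_T)               # step 1
    s_prime = bec(z, s)                   # step 2
    u = brute_force_quantize(G, s_prime)  # step 3
    if u is None:
        return None
    c = vec_mat(u, G)
    return [(ci + zi) % 2 for ci, zi in zip(c, z)]  # step 7

def dec(H, x):
    """Equation (1): the syndrome x H^T."""
    return vec_mat(x, transpose(H))

# Hypothetical toy code with n = 6, r = 3, k = 3.
P = [[1, 0, 1], [1, 1, 0], [0, 1, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
G = [I3[i] + transpose(P)[i] for i in range(3)]   # G_Q = [I | P^T]
H = [P[i] + I3[i] for i in range(3)]              # H = [P | I]
H_inv_T = [[0, 0, 0] + I3[i] for i in range(3)]   # (H^{-1})^T = [0 | I]

state = [0, 1, 1, 0, 1, 1]     # cells 0 and 3 are already 0 and must stay 0
message = [1, 0, 1]
x = enc(G, H_inv_T, message, state)
print("written word:", x, "decoded message:", dec(H, x))
```

Cells with $s_i = 0$ become the non-erased constraints of the quantization problem, while cells with $s_i = 1$ are erased and therefore free, which is why a larger $\beta$ makes encoding easier.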

III. Rewriting with Message Passing 

In this section we discuss how to choose a quantization code
$C_Q$ and quantization algorithm $Q$ to obtain a rewriting scheme
of good performance. Our approach is to use the iterative
quantization scheme of Martinian and Yedidia [?], where $C_Q$
is an LDGM code, and $Q$ is a message-passing algorithm. This
approach is particularly relevant for flash memories, since the
hardware architecture of message-passing algorithms is well
understood and highly optimized in flash controllers.

The algorithm $Q$ can be implemented by a sequential or
parallel scheduling, as described in [?, Section 3.4.2]. For
concreteness, we consider the sequential algorithm denoted
by ERASURE-QUANTIZE in [?]. Since the performance of
ERASURE-QUANTIZE depends on the chosen generator matrix,
we abuse notation and denote it by $Q(G_Q, \mathbf{s}')$. Algorithm
$Q(G_Q, \mathbf{s}')$ is presented in Appendix A for completeness.

Finally, we need to describe how to choose a generator
matrix $G_Q$ that works well together with Algorithm $Q$. We
show next that a matrix $G_Q$ with good rewriting performance
can be chosen to be a parity-check matrix that performs
well in message-passing decoding of erasure channels. This
connection follows from the connection between rewriting and
quantization, together with a connection between quantization
and erasure decoding, shown in [?]. These connections imply
that we can use the rich theory and understanding of the
design of parity-check matrices in iterative erasure decoding,
to construct good generator matrices for rewriting schemes.
To make the statement precise, we consider the standard
iterative erasure-decoding algorithm denoted by
ERASURE-DECODE($H$, $\mathbf{y}$) in [?], where $H$ is an LDPC matrix and $\mathbf{y}$
is the output of a binary erasure channel.

Theorem 5. For all $\mathbf{m} \in \mathbb{F}_2^k$ and $\mathbf{z}', \mathbf{s} \in \mathbb{F}_2^n$, $\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$
fails if and only if ERASURE-DECODE$(G_Q, \mathrm{BEC}(\mathbf{z}', \mathbf{s} + \mathbf{1}_n))$
fails, where $\mathbf{1}_n$ is the all-one vector of length $n$.

The proof of Theorem 5 is available in Appendix B. The
running time of the encoding algorithm ENC is analyzed
formally in the following theorem.

Fig. 2. Rewriting failure rates of polar and LDGM WOM codes.

Theorem 6. The algorithm $\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$ runs in time $O(nd)$,
where $n$ is the length of $\mathbf{s}$ and $d$ is the maximum degree of the
Tanner graph of $G_Q$.

The proof of Theorem 6 is available in Appendix C.
Theorems 5 and 6, together with the analysis and design of irregular
LDPC codes that achieve the capacity of the binary erasure
channel [?], imply the following capacity-achieving results.

Corollary 7. There exists a sequence of rewriting codes which
can be efficiently encoded by Algorithm 1 and efficiently
decoded by Equation (1) that achieves the capacity of the
rewriting model $\beta$.

The proof of Corollary 7 is available in Appendix D.

The finite-length performance of our rewriting scheme
is evaluated using extensive simulation with the choice of
$\beta = 0.5$ and $G_Q$ to be the parity-check matrix of a Mackay
code [?]. The rewriting failure rates of our codes with lengths
$n = 8000$ and $16000$ that are relevant to flash applications are
compared with those of the polar WOM codes of lengths $2^{13}$
and $2^{14}$ [?]. Fig. 2 shows the rewriting failure rates of both
codes at different rewriting rates, where each point is calculated
from 10® experiments. Remember that the capacity of the 
model is 0.5. The results suggest that our scheme achieves 
a decent rewriting rate (e.g. 0.39) with low failure rate (e.g.
$< 10^{-4}$). Moreover, our codes provide significantly lower
failure rates than polar WOM codes when the rewriting rate
is smaller, because of the good performance in the waterfall
region of the message-passing algorithm.

IV. Error-Correcting Rewriting Codes 

The construction of error-correcting rewriting codes is based
on a pair of linear codes $(C_1, C_Q)$ that satisfies the condition
$C_1 \supset C_Q$, meaning that each codeword of $C_Q$ is also a
codeword of $C_1$. Define $C_2$ to be the dual of $C_Q$, denoted
by $C_2 = C_Q^\perp$. A pair of linear codes $(C_1, C_2)$ that satisfies
$C_1 \supset C_2^\perp$ is called a conjugate code pair, and it is useful
in quantum error correction and cryptography [?]. For the
flash memory application, we let $C_1$ be an error-correction
code, while $C_2^\perp = C_Q$ is a BEQ code. The main idea in the
construction of error-correcting rewriting codes is to label only
the codewords of $C_1$, according to their membership in the
cosets of $C_Q$. The construction is defined formally as follows:














Construction 8. For $\mathbf{c} \in C_1$, let $\mathbf{c} + C_Q$ be the coset of $C_Q$ in $C_1$ that contains $\mathbf{c}$. Then the error-correcting rewriting code is constructed to be the collection of cosets of $C_Q$ in $C_1$.

Next we define the matrices $(H^{-1})^T$ and $H^T$ to be used in encoding and decoding. Let $G_1$ and $G_Q$ be generator matrices of the codes $C_1$ and $C_Q$, respectively, such that each row of $G_Q$ is also a row of $G_1$. Since $C_1$ contains $C_Q$, such a matrix pair always exists. Define $(H^{-1})^T$ to be constructed by the rows of $G_1$ that are not rows of $G_Q$. Let $H^T$ be a right inverse of $(H^{-1})^T$.

The encoding is performed according to Algorithm 1, with the matrix $(H^{-1})^T$ defined above. Note that in Step 1, $\mathbf{z}$ is a codeword of $C_1$, since each row of $(H^{-1})^T$ is also a row of $G_1$. In addition, in Step 7, $\mathbf{u} G_Q$ is also a codeword of $C_1$ (unless $Q(G_Q, \mathbf{s}')$ fails), since $C_Q$ is contained in $C_1$. Therefore, $\mathbf{x} = \mathbf{u} G_Q + \mathbf{z}$ is a codeword of $C_1$. The decoding can begin by the recovery of $\mathbf{x}$ from its noisy version, using the decoder of $C_1$. The message $\mathbf{m}$ can then be recovered by the product $\mathbf{x} H^T$.

A similar framework was described in [?], which proposed
a construction of a repetition code contained in a Hamming 
code, with a Viterbi encoding algorithm. In this paper we make 
the connection to the quantum coding literature, which allows 
us to construct stronger codes. 

A. Conjugate Codes Construction 

We look for a conjugate pair $(C_1, C_2)$ such that $C_1$ is
a good error-correcting code, while $C_2^\perp$ is a good LDGM
quantization code. Theorem 5 implies that $C_2$ needs to be an
LDPC code with a good performance over a binary erasure
channel (under message passing decoding). Constructions of
conjugate code pairs in which $C_2$ is an LDPC code are studied
in [?]. Sarvepalli et al. [?] construct a pair of codes
such that $C_1$ is a BCH code and $C_2$ is a Euclidean geometry
LDPC code, which is particularly useful for our purpose.
This is because BCH codes are used extensively for error
correction in flash memories. Below we first briefly review
the construction of Euclidean geometry LDPC codes and then
discuss the application of the results in [?] to our settings.

Denote by $EG(m, p^s)$ the Euclidean finite geometry over $\mathbb{F}_{p^s}$ consisting of $p^{ms}$ points. Note that this geometry is equivalent to the vector space $\mathbb{F}_{p^s}^m$. A $\mu$-dimensional subspace of $\mathbb{F}_{p^s}^m$ or its coset is called a $\mu$-flat. Let $J$ be the number of $\mu$-flats that do not contain the origin, and let $\alpha_1, \ldots, \alpha_{p^{sm}-1}$ be the points of $EG(m, p^s)$ excluding the origin. Construct a $J \times (p^{sm} - 1)$ matrix $H_{EG}$ in the way that its $(i,j)$-th entry equals 1 if the $i$-th $\mu$-flat contains $\alpha_j$, and equals 0 otherwise. $H_{EG}$ is the parity check matrix of the (Type-I) Euclidean geometry LDPC code $C_{EG}(m, \mu, s, p)$. $C_{EG}(m, \mu, s, p)$ is a cyclic code and by analyzing the roots of its generator polynomial, the following result is obtained [?].

Proposition 9. $C_{EG}^\perp(m, \mu, s, p)$ is contained in a BCH code of design distance $\delta = p^{\mu s} - 1$.

Hence we may choose $C_2$ to be $C_{EG}(m, \mu, s, p)$ and $C_1$ to be a BCH code with distance equal to or smaller than $\delta$. Some possible code constructions are shown in Table I. Their encoding performance, with respect to the probability $\beta$ that a cell in the state is writable, is shown in Fig. 3. Note from Fig. 3 that a code with smaller rewriting rate achieves a fixed failure rate at a smaller value of $\beta$. In particular, the codes corresponding to the top three rows of Table I achieve very small failure rate at $\beta = 0.5$, the point of practical interest. These results also show that the slope of the curves becomes sharper when the length of the codes increases, as expected. Out of the three codes that can be rewritten with $\beta = 0.5$, $C_{EG}(3,1,3,2)$ possesses the best rate and error-correction capability.

TABLE I
Error-correcting Rewriting Codes Constructed from Pairs of Conjugate BCH and EG-LDPC Codes.

(m, μ, s, p)    C_1 [n, k, δ]       C_2 [n, k]      Rewriting Rate
(4, 1, 2, 2)    [255, 247, 3]       [255, 21]       0.0510
(3, 1, 2, 2)    [63, 57, 3]         [63, 13]        0.1111
(3, 1, 3, 2)    [511, 484, 7]       [511, 139]      0.2192
(3, 1, 4, 2)    [4095, 4011, 15]    [4095, 1377]    0.3158
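
The entries of Table I can be cross-checked with a short calculation. The sketch below is our own reading of the construction rather than a formula stated in the text. It assumes the code length is $n = p^{ms} - 1$, takes the design distance from Proposition 9, and counts the cosets of $C_Q = C_2^\perp$ in $C_1$, which gives a rewriting rate of $(k_1 - (n - k_2))/n$. The function name and the row list are illustrative.

```python
def conjugate_pair_parameters(m, mu, s, p, k1, k2):
    """Derive (n, delta, rewriting rate) for a BCH / EG-LDPC conjugate pair.

    k1 is the dimension of the BCH code C_1 and k2 the dimension of the
    EG-LDPC code C_2, both taken from the table.
    """
    n = p ** (m * s) - 1        # number of non-origin points of EG(m, p^s)
    delta = p ** (mu * s) - 1   # design distance from Proposition 9
    dim_cq = n - k2             # C_Q is the dual of C_2
    rate = (k1 - dim_cq) / n    # log2(number of cosets of C_Q in C_1) / n
    return n, delta, rate

# (m, mu, s, p), k1, k2 for the four constructions
TABLE_ROWS = [
    ((4, 1, 2, 2), 247, 21),
    ((3, 1, 2, 2), 57, 13),
    ((3, 1, 3, 2), 484, 139),
    ((3, 1, 4, 2), 4011, 1377),
]

for params, k1, k2 in TABLE_ROWS:
    n, delta, rate = conjugate_pair_parameters(*params, k1, k2)
    print(params, "n =", n, "delta =", delta, "rate =", round(rate, 4))
```

Under these assumptions the second construction has length $2^6 - 1 = 63$, and the dimension of $C_Q$ for the third construction is $511 - 139 = 372$.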


Fig. 3. Encoding performance of the codes in Table I (curves for the parameter sets (3,1,2,2), (3,1,3,2), (4,1,2,2), and (3,1,4,2)).

V. Alternative Approaches for Error Correction 

In this section we present two alternative approaches to 
combine rewriting codes with error correction. 

A. Concatenated Codes 

In this scheme, we concatenate an LDGM rewriting code 
with a systematic error-correcting code. The outer code is an 
LDGM rewriting code without error-correction capability, as 
in Section III. The systematic ECC is used as the inner code.
The concatenated scheme is used in the second write. The 
scheme requires the first write to reserve some bits to store 
the redundancy of the ECC in the second write. 

In the second write, the encoder begins by finding a vector
$\mathbf{x}$ that can be written over the current state. After $\mathbf{x}$ is written,
the systematic ECC calculates the redundancy bits required to
protect $\mathbf{x}$ from errors. The redundancy bits are then written into
the reserved cells. The decoding of the second write begins
by recovering $\mathbf{x}$ using the systematic ECC and its redundancy
bits. After $\mathbf{x}$ is recovered, the decoder of the rewriting code
recovers the stored message from $\mathbf{x}$.

We note that reserving bits for the second write has a
negative effect on the performance of the system, since it
reduces the total amount of information that could be stored
in the memory at a given time. Therefore, the next subsection
extends the concatenation scheme using a chaining technique,
with the aim of reducing the number of bits required to be
reserved for the second write.

B. Code Chaining 

The chaining approach is inspired by a similar construction 
in polar coding [?]. The idea is to chain several code blocks
of short length. In the following we use a specific example to 
demonstrate the idea. We use a BCH code for error correction, 
since its performance can be easily calculated. We note, 
however, that LDPC codes may be used in practice, such that 
the circuit modules may be shared with the rewriting code, to 
reduce the required area. The performance of LDPC codes in
the considered parameters is similar to that of BCH codes. 

A typical BCH code used in flash memory has the parameters
[8191,7671,81], where the length is 8191, the dimension
is 7671, and the minimum distance is 81. If this code is used
in a concatenated scheme for the second write, the first write
needs to reserve $8191 - 7671 = 520$ bits for redundancy.

To reduce the amount of required reserved bits, we consider the chaining of 8 systematic BCH codes with the parameters [1023,863,33]. The encoding is performed sequentially, beginning with the rewriting encoding that finds a vector $\mathbf{x}_1$ of 863 bits. The vector $\mathbf{x}_1$ represents a message $\mathbf{m}_1$ of 310 bits, according to an [863,310]-LDGM rewriting code. Once $\mathbf{x}_1$ is found, the BCH encoder finds $1023 - 863 = 160$ redundancy bits to protect $\mathbf{x}_1$, as in the concatenated scheme. The encoder then "chains" the redundancy bits forward, by encoding them, together with 150 new information bits, into another block of 863 bits, using the [863,310]-LDGM code. Let $\mathbf{m}_2$ denote the vector of 310 bits encoded into the second block. $\mathbf{m}_2$ contains the 160 redundancy bits of $\mathbf{x}_1$, together with the additional 150 information bits. Note that once $\mathbf{m}_2$ is decoded, the redundancy bits of $\mathbf{x}_1$ are available, allowing the recovery of $\mathbf{x}_1$, and then $\mathbf{m}_1$. The encoding continues in this fashion 8 times, to write over a total of 8 blocks, each containing 863 cells. The 160 redundant bits used to protect $\mathbf{x}_8$ are stored in the reserved cells. The decoding is done in the reverse order, where each decoded vector contains the redundancy bits of the previous block.
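
The bookkeeping of the chaining example can be tallied with a few lines of Python. This is only an accounting sketch under the parameters quoted above. The function name and the returned field names are our own, and the total of information bits is derived here rather than quoted from the text.

```python
def chaining_budget(num_blocks, bch_n, bch_k, msg_bits):
    """Bit accounting for chaining `num_blocks` rewriting blocks.

    bch_n, bch_k: length and dimension of the systematic BCH code.
    msg_bits: message length carried by each LDGM rewriting block.
    """
    redundancy = bch_n - bch_k        # BCH parity bits produced per block
    fresh = msg_bits - redundancy     # new information bits in blocks 2..num_blocks
    total_info = msg_bits + (num_blocks - 1) * fresh
    return {
        "redundancy_per_block": redundancy,
        "fresh_bits_per_later_block": fresh,
        "total_information_bits": total_info,
        "reserved_bits": redundancy,  # only the last block's parity is reserved
    }

chained = chaining_budget(num_blocks=8, bch_n=1023, bch_k=863, msg_bits=310)
single_code_reserved = 8191 - 7671    # reservation of the unchained scheme
print(chained, "versus", single_code_reserved, "reserved bits without chaining")
```

The first block carries 310 information bits and each of the remaining seven carries 150, so chaining trades reserved space (160 bits instead of 520) for information bits spent on carrying redundancy.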

C. Comparison 

We compare the different error-correction approaches, and
discuss their trade-offs. The first code in the comparison is a
conjugate code pair, described in Section IV. We use a conjugation
of a [511,484,7]-BCH code containing a [511,372]-LDGM
code, dual to the (3,1,3,2)-Euclidean geometry LDPC
code in Table I. The second code in the comparison is a
concatenation of an outer [7671,2915]-LDGM Mackay rewriting
code with an inner [8191,7671,81]-BCH code. The third
code is a chaining of 8 blocks of [863,310]-LDGM Mackay
codes, each concatenated with a [1023,863,33]-BCH code.
We compare the decoding BER $P_D$, the fraction $\alpha$ of bits
required to be reserved, and the rewriting rate $R_{\mathrm{WOM}}$ of the
codes. The encoding failure rate of each of the three codes
for = 0.5 is below 10“^. Pd is estimated with a standard 


TABLE II
Error-correcting rewriting codes of length 8200.

Code            P_D        α       R_WOM
Conjugated      10^-3      0%      0.21
Concatenated    10^-16     6.3%    0.35
Chained         10^-16     2%      0.19


flash memory assumption of a raw BER of $1.3 \times 10^{-3}$. To
achieve a comparable code length, the conjugated code is
assumed to be used 16 times in parallel, with a total length of
$511 \times 16 = 8176$. The comparison is summarized in Table II.
Elash systems require Pd below lO-^®. We see in Table [B] 
that the conjugated code still does not satisfy the reliability
requirement. We also see that concatenated codes that satisfy the
reliability requirement need a large fraction of reserved space.
The chained code reduces the fraction of reserved space to
2%, with a rate penalty in the second write.
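
The reserved-space figures can be checked by hand. As a rough check, assuming (a normalization the text does not state explicitly) that $\alpha$ is the number of reserved bits divided by the total BCH length, with the chained code occupying $8 \times 1023 = 8184$ bits:

$$\alpha_{\text{concatenated}} = \frac{520}{8191} \approx 6.3\%, \qquad \alpha_{\text{chained}} = \frac{160}{8184} \approx 2.0\%.$$

The conjugated code stores no separate redundancy, which is consistent with its entry of 0%.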
Appendix A

Iterative Quantization Algorithm 

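We denote $G_Q = (\mathbf{g}_1, \ldots, \mathbf{g}_n)$ such that $\mathbf{g}_j$ is the $j$-th
column of $G_Q$.

Algorithm 2 $\mathbf{u} = Q(G_Q, \mathbf{s}')$.
1: $\mathbf{v} \leftarrow \mathbf{s}'$
2: while $\exists j$ such that $v_j \neq *$ do
3:   if $\exists i$ such that $\exists! j$ for which $G_Q(i,j) = 1$ and $v_j \neq *$ then
4:     Push $(i,j)$ into the Stack.
5:     $v_j \leftarrow *$.
6:   else
7:     return FAILURE
8:   end if
9: end while
10: $\mathbf{u} \leftarrow \mathbf{0}_{n-k}$
11: while Stack is not empty do
12:   Pop $(i,j)$ from the Stack.
13:   $u_i \leftarrow \mathbf{u} \cdot \mathbf{g}_j + s'_j$
14: end while
15: return $\mathbf{u}$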
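
A direct Python transcription of the steps above is sketched here for illustration. It assumes a dense list-of-rows representation of $G_Q$ and uses `None` for the erasure symbol. The function name and the toy matrix are our own. The row search is a plain scan, so this version does not attain the $O(nd)$ running time of a properly scheduled implementation.

```python
ERASED = None  # plays the role of '*'

def erasure_quantize(G, s_prime):
    """Peeling quantizer following the structure of Algorithm 2.

    G is an r x n generator matrix over F_2 (list of rows); s_prime has
    entries 0, 1 or ERASED. Returns a label u with (uG)_j = s'_j on every
    non-erased position j, or None when the peeling gets stuck.
    """
    r, n = len(G), len(G[0])
    v = list(s_prime)
    stack = []
    # Peeling phase: find a row i with exactly one non-erased column j,
    # reserve u_i for that constraint, and erase v_j.
    while any(vj is not ERASED for vj in v):
        for i in range(r):
            cols = [j for j in range(n) if G[i][j] == 1 and v[j] is not ERASED]
            if len(cols) == 1:
                stack.append((i, cols[0]))
                v[cols[0]] = ERASED
                break
        else:
            return None  # no suitable row: quantization failure
    # Back-substitution phase: resolve the reserved bits in reverse order.
    u = [0] * r
    while stack:
        i, j = stack.pop()
        parity = sum(u[l] * G[l][j] for l in range(r)) % 2  # u . g_j, with u_i still 0
        u[i] = (parity + s_prime[j]) % 2
    return u

# Hypothetical 3 x 6 generator matrix used only for this example.
G_toy = [[1, 0, 0, 1, 1, 0],
         [0, 1, 0, 0, 1, 1],
         [0, 0, 1, 1, 0, 1]]
print(erasure_quantize(G_toy, [1, ERASED, ERASED, ERASED, ERASED, ERASED]))
print(erasure_quantize(G_toy, [0, 0, 0, 0, 0, 0]))
```

The second call fails even though the all-zero codeword matches the source: every row of the toy matrix touches three non-erased columns, so the peeling never starts. This illustrates that the algorithm can fail on inputs for which a brute-force search would find a solution.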


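Appendix B
Proof of Theorem 5

Proof: As in Algorithm 1, let $\mathbf{z} = \mathbf{m} (H^{-1})^T$ and $\mathbf{s}' = \mathrm{BEC}(\mathbf{z}, \mathbf{s})$. Now according to Algorithm 1, $\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$ fails if and only if $Q(G_Q, \mathbf{s}')$ fails. According to [?, Theorem 4], $Q(G_Q, \mathbf{s}')$ fails if and only if ERASURE-DECODE$(G_Q, \mathrm{BEC}(\mathbf{z}', \mathbf{s} + \mathbf{1}_n))$ fails. This completes the proof. ■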

Appendix C
Proof of Theorem 6

Proof: We first show that Step 1 of Algorithm 1 runs in time $O(n)$ if $(H^{-1})^T$ is chosen in the following way. For any $G_Q$, its parity check matrix $H$ can be put into systematic form, i.e., $H = (P \; I)$, by row operations and permutation of columns. Then $(H^{-1})^T$ can be chosen as $(0_{k \times (n-k)} \; I_k)$, and so $\mathbf{z} = \mathbf{m} (H^{-1})^T = (\mathbf{0}_{n-k}, \mathbf{m})$.

By [?, Theorem 5], Step 3 of Algorithm 1 runs in time $O(nd)$. By the definition of $d$, the complexity of Step 7 is also $O(nd)$. Therefore $O(nd)$ dominates the computational cost of the algorithm. ■

Appendix D
Proof of Corollary 7

Proof: Let $\bar{\mathbf{s}} = \mathbf{s} + \mathbf{1}_n$. Then it follows from Theorem 5 that for all $G_Q$, $\mathbf{m} \in \mathbb{F}_2^k$, $\mathbf{z}' \in \mathbb{F}_2^n$,

$$\Pr\{\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s}) = \text{FAILURE}\} = \Pr\{\text{ERASURE-DECODE}(G_Q, \mathrm{BEC}(\mathbf{z}', \bar{\mathbf{s}})) = \text{FAILURE}\},$$

where $\mathbf{s}$ is distributed i.i.d. with $\Pr\{s_i = 1\} = \beta$. The right-hand side is the decoding-failure probability of an LDPC code with parity-check matrix $G_Q$ over a binary erasure channel, using message-passing decoding. The erasure probability of the channel is $1 - \beta$, because $\Pr\{\bar{s}_i = 1\} = 1 - \Pr\{s_i = 1\}$. The capacity of a binary erasure channel with erasure probability $1 - \beta$ is $\beta$. This is also the capacity of the rewriting model. In addition, the rate of an LDPC code with parity-check matrix $G_Q$ is equal to the rate of a rewriting code constructed by the cosets of $C_Q$. It is shown in [?] how to construct a sequence of irregular LDPC codes that achieves the capacity of the binary erasure channel. Such a sequence, used for rewriting codes, achieves the rewriting capacity. ■

Appendix E 

Handling Encoding Failures

The encoding failure event could be dealt with in several 
ways. A simple solution is to try writing on different invalid 
pages, if available, or to simply write into a fresh page, as 
current flash systems do. If the failure rate is small enough, 
say below 10 “^, the time penalty of rewriting failures would 
be small. For an alternative solution, we state a reformulation 
of [?, Theorem 3].

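Proposition 10. For all $\mathbf{m}, \mathbf{m}' \in \mathbb{F}_2^k$ and $\mathbf{s} \in \mathbb{F}_2^n$, $\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$ fails if and only if $\mathrm{ENC}(G_Q, \mathbf{m}', \mathbf{s})$ fails.

Proof: As in Algorithm 1, let $\mathbf{z} = \mathbf{m} (H^{-1})^T$ and $\mathbf{s}' = \mathrm{BEC}(\mathbf{z}, \mathbf{s})$. Note that $\mathrm{ENC}(G_Q, \mathbf{m}, \mathbf{s})$ fails if and only if $Q(G_Q, \mathbf{s}')$ fails. By Algorithm 2, the failure of $Q(G_Q, \mathbf{s}')$ is determined only according to the locations of erasures in $\mathbf{s}'$, and does not depend on the values of the non-erased entries of $\mathbf{s}'$. Since $\mathbf{s}' = \mathrm{BEC}(\mathbf{z}, \mathbf{s})$, the locations of erasures in $\mathbf{s}'$ are only determined by the state $\mathbf{s}$. This completes the proof. ■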

Proposition 10 implies that whether a page is rewritable
does not depend on the message to be written. This property
suggests that the flash controller can check whether a page is
rewritable right after it is invalidated, without waiting
for a message to arrive. An invalid page could be marked
as 'unrewritable', such that data would be rewritten only
into rewritable pages. This policy would guarantee that the
rewriting of a new message always succeeds. However, this
policy also implies that the message passing algorithm would
run more than once for the rewriting of a page.
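
The message-independent check suggested above can be sketched as follows. This is a minimal illustration under our own naming and with a hypothetical toy matrix, not a description of any controller firmware. It runs only the peeling phase of the quantizer on the erasure pattern induced by the state, in which position $j$ is a constraint exactly when $s_j = 0$.

```python
def is_rewritable(G, s):
    """Return True if the peeling phase succeeds for cell state s.

    Only the positions of the zeros in s are used; the values that a
    message would place there are never consulted.
    """
    r, n = len(G), len(G[0])
    pending = {j for j in range(n) if s[j] == 0}   # non-erased positions
    while pending:
        for i in range(r):
            cols = [j for j in pending if G[i][j] == 1]
            if len(cols) == 1:        # row i sees exactly one constraint
                pending.discard(cols[0])
                break
        else:
            return False              # peeling is stuck: page not rewritable
    return True

# Hypothetical 3 x 6 generator matrix used only for this example.
G_example = [[1, 0, 0, 1, 1, 0],
             [0, 1, 0, 0, 1, 1],
             [0, 0, 1, 1, 0, 1]]
for state in ([1, 1, 1, 1, 1, 1], [0, 1, 1, 0, 1, 1], [0, 0, 0, 0, 0, 0]):
    print(state, is_rewritable(G_example, state))
```

A controller following the policy above would run such a check once when a page is invalidated and store the result as a per-page flag.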







